Abstract: Motivated by applications of distributed linear estimation, distributed control,
and distributed optimization, we consider the question of designing linear iterative
algorithms for computing the average of numbers in a network. Specifically, our interest is
in designing such an algorithm with the fastest rate of convergence given the topological
constraints of the network. As the main result of this paper, we design an algorithm with
the fastest possible rate of convergence using a non-reversible Markov chain on the given
network graph. We construct such a Markov chain by transforming the standard Markov
chain, which is obtained using the Metropolis-Hastings method. We call this novel
transformation pseudo-lifting. We apply our method to graphs with geometry, or graphs with
doubling dimension. Specifically, the convergence time of our algorithm (equivalently, the
mixing time of our Markov chain) is proportional to the diameter of the network graph
and hence optimal. As a byproduct, our result provides the fastest mixing Markov chain
given the network topological constraints, and should naturally find applications in
the context of distributed optimization, estimation and control.
Keywords and phrases: consensus, lifting, linear averaging, Markov chain, non-reversible,
pseudo-lifting, random walk.

The recently emerging network paradigms such as sensor networks, peer-to-peer networks and
surveillance networks of unmanned vehicles have led to the requirement of designing distributed,
iterative and efficient algorithms for estimation, detection, optimization and control. Such
algorithms provide scalability and robustness necessary for the operation of such highly distributed
and dynamic networks. In this paper, motivated by applications of linear estimation in sensor
networks [16] [6] [23] [31], information exchange in peer-to-peer networks [20] [26] and reaching
consensus in unmanned vehicles [TH], we consider the problem of computing the average of
numbers in a given network in a distributed manner. Specifically, we consider the class of algorithms
for computing the average using distributed linear iterations. In applications of interest, the rate
of convergence of the algorithm strongly affects its performance. For example, the rate of
convergence of the algorithm determines the agility of a distributed estimator to track the desired
value [6] or the error in the distributed optimization algorithm [27]. For these reasons, designing
algorithms with a fast rate of convergence is of great recent interest [G] [3] [10], and it is the
question that we consider in this paper.

Consider a network of n nodes whose communication graph is denoted by $G = (V, E)$, where
$V = \{1, \dots, n\}$ and $E = \{(i, j) : i \text{ and } j \text{ can communicate}\}$. Each node has a
distinct value and our interest is in designing a distributed iterative algorithm for computing the
average of these numbers at the nodes. A popular approach, started by Tsitsiklis [31], involves
finding a non-negative valued $n \times n$ matrix $P = [P_{ij}]$ such that

(a) $P$ is graph conformant, i.e. if $(i, j) \notin E$ then $P_{ij} = 0$,

(b) $\mathbf{1}^T P = \mathbf{1}^T$, where $\mathbf{1} = [1]$ is the (column) vector of all components 1,

(c) $P^t x \to x_{\mathrm{ave}} \mathbf{1}$ as $t \to \infty$ for any $x \in \mathbb{R}^n$, where $x_{\mathrm{ave}} = \left(\sum_{i=1}^{n} x_i\right)/n$.

This is equivalent to finding an irreducible, aperiodic random walk on graph G with the uniform 
stationary distribution. 
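Conditions (a) to (c) can be illustrated with a minimal sketch. It assumes, purely for illustration, the symmetric walk on a ring that stays put with probability 1/2 and moves to each neighbor with probability 1/4; the function names and the stopping tolerance are hypothetical and not part of the paper.

```python
def ring_lazy_walk(n):
    """Graph-conformant matrix on the n-node ring: stay w.p. 1/2, each neighbor w.p. 1/4."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 0.5
        P[i][(i + 1) % n] += 0.25
        P[i][(i - 1) % n] += 0.25
    return P


def step(P, x):
    """One linear iteration x <- P x; node i only combines its own and its neighbors' values."""
    n = len(x)
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


def iterations_to_average(P, x, eps):
    """Iterate until every entry is within eps * |x_ave| of the average x_ave."""
    x_ave = sum(x) / len(x)
    t = 0
    while max(abs(v - x_ave) for v in x) > eps * abs(x_ave):
        x = step(P, x)
        t += 1
    return t, x


n = 8
x0 = [float(i + 1) for i in range(n)]  # arbitrary initial values with average 4.5
t, x_final = iterations_to_average(ring_lazy_walk(n), x0, 1e-6)
print(t, x_final)
```

Because this particular matrix is symmetric, its column sums equal one, so the sum of the values (and hence the average) is preserved at every iteration.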

The quantity of interest, or the performance of the algorithm, is the time it takes for the algorithm
to get close to $x_{\mathrm{ave}} \mathbf{1}$ starting from any $x$. Specifically, given $P$, define the
$\varepsilon$-computation time of the algorithm as



It is well-known that the $\varepsilon$-computation time of $P$ is proportional to the mixing time,
denoted as $\mathcal{H}(P)$, of the random walk with transition matrix $P$. Thus, the question of
interest in this paper is to find a graph conformant $P$ with the smallest computation time or,
equivalently, a random walk with the smallest mixing time. Indeed, the question of designing a
random walk on a given graph with the smallest mixing time in complete generality is a well-known
unresolved question.

The standard approach to finding such a $P$ is based on the method of Metropolis [25] and
Hastings [12]. This results in a reversible random walk $P$ on $G$. The mixing time $\mathcal{H}(P)$
is known to be bounded as



where $\Phi(P)$ denotes the conductance of $P$. Now, for expander graphs the resulting $P$ induced
by the Metropolis-Hastings method is likely to have $\Phi(P) = \Theta(1)$ and hence the mixing time
is $O(\log n)$, which is essentially the fastest possible. For example, a random walk $P = [1/n]$ on
the complete graph has $\Phi(P) = 1/2$ with mixing time $O(1)$. Thus, the question of interest is
reasonably resolved for graphs that are expanding.

Now the graph topologies arising in practice, such as those in a wireless sensor network deployed
in some geographic area [SKTO] or a nearest neighbor network of unmanned vehicles [30], do possess
geometry and are far from being expanders. A simple example of a graph with geometry is the ring
graph of n nodes as shown in Figure 1. The Metropolis-Hastings method will lead to $P_1$ shown
in Figure 1(a). Its mixing time is $O(n^2 \log n)$ and no smaller than $\Omega(n^2)$ (e.g. see [1]). More
generally, the mixing time of any reversible random walk on the ring graph is lower bounded
by $\Omega(n^2)$ [29]. Note that the diameter of the ring graph is of order $n$ and obviously
no random walk can mix faster than the diameter. Hence, a priori it is not clear if the fastest
mixing time is $n^2$ or $n$ or something in between: that is, does the smallest mixing time of the
random walk on a typical graph $G$ scale like the diameter of $G$, the square of the diameter or a
power of the diameter in (1,2)?

¹ Lemma 8 states the precise relation. Known terms, such as mixing time, that are used here are defined in

Fig 1. (a) $P_1$ on the ring graph $G_1$. (b) $P_2$ on the lifted ring graph $G_2$.

In general, in most cases of interest the mixing time of the reversible walk $P$ scales like
$1/\Phi^2(P)$. The conductance $\Phi(P)$ relates to the diameter $D$ of a graph $G$ as $1/\Phi(P) \ge D$.
Therefore, in such situations the mixing time of the random walk based on the Metropolis-Hastings
method is likely to scale like $D^2$, the square of the diameter. Indeed, Diaconis and Saloff-Coste [2S]
established that for a certain class of graphs with geometry the mixing time of any reversible
random walk scales like at least $D^2$, and it is achieved by the Metropolis-Hastings approach.
Thus, reversible random walks result in rather poor performance for graphs with geometry, i.e.
their mixing time is far from our best hope, the diameter $D$.

Motivated by this, we wish to undertake the following reasonably ambitious question in this
paper: is it possible to design a random walk with mixing time of the order of the diameter $D$ for
any graph? We will answer this question in the affirmative by producing a novel construction of
non-reversible random walks on the lifted version of graph $G$. Thus, we will design iterative
averaging algorithms with the fastest possible rate of convergence.

1.1. Related work 

In an earlier work, Diaconis, Holmes and Neal [9] introduced a construction of a non-reversible
random walk on the ring (and more generally ring-like) graph. This random walk runs on the
lifted ring graph, which is described as $G_2$ in Figure 1(b). Here, by lifting we mean making
additional copies of the nodes of the original graph and adding edges between some of these
copies while preserving the original graph topology. Figure 1(b) explains the construction in
[9] for the ring graph. Note that each node has two copies and the lifted graph is essentially
composed of two rings: an inner ring and an outer ring. The transition on the inner circle forms a
clockwise circulation and the transition on the outer circle forms a counterclockwise circulation.
The probability of changing from the inner circle to the outer circle, and vice versa, is
$1/n$ at each step. By defining transitions in this way, the stationary distribution is also preserved;
i.e. the sum of stationary distributions of copies is equal to the stationary distribution of their
original node. Somewhat surprisingly, the authors of [9] proved that this non-reversible random
walk has the linear mixing time $O^*(n)$.² Thus, effectively (i.e. up to a $\log n$ factor) the mixing
time is of the order of the diameter $n$. It should be noted that because lifting preserves the graph
topology and the stationary distribution, it is possible to simulate this lifted random walk on
the original graph by expanding the state appropriately, with the desired output. Equivalently,
it is possible to use a lifted random walk for linear averaging by running iterations with extra
states.³
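The lifted ring walk can be compared numerically with the reversible walk in a short sketch. Two choices here are assumptions made for illustration and are not taken from [9]: a switch moves the walker to the other copy of the same node, and the reversible baseline is the walk that stays with probability 1/2 and moves to each neighbor with probability 1/4. An odd $n$ is used so that the lifted chain is aperiodic.

```python
def lazy_ring_step(mu):
    """One step of the reversible ring walk: stay w.p. 1/2, each neighbor w.p. 1/4."""
    n = len(mu)
    return [0.5 * mu[i] + 0.25 * mu[(i - 1) % n] + 0.25 * mu[(i + 1) % n]
            for i in range(n)]


def lifted_ring_step(inner, outer, flip):
    """One step of the lifted walk: the inner copy moves i -> i+1, the outer copy i -> i-1,
    and with probability `flip` the walker switches to the other copy of the same node."""
    n = len(inner)
    new_inner = [(1 - flip) * inner[(i - 1) % n] + flip * outer[i] for i in range(n)]
    new_outer = [(1 - flip) * outer[(i + 1) % n] + flip * inner[i] for i in range(n)]
    return new_inner, new_outer


def tv_to_uniform(mu):
    """Total variation distance between mu and the uniform distribution."""
    n = len(mu)
    return 0.5 * sum(abs(p - 1.0 / n) for p in mu)


n = 21            # odd, so the lifted chain is aperiodic
steps = 8 * n     # a number of steps linear in n

lazy = [1.0] + [0.0] * (n - 1)                 # start at node 0
inner, outer = [1.0] + [0.0] * (n - 1), [0.0] * n
for _ in range(steps):
    lazy = lazy_ring_step(lazy)
    inner, outer = lifted_ring_step(inner, outer, 1.0 / n)

projected = [a + b for a, b in zip(inner, outer)]  # add up the two copies of each node
tv_lazy, tv_lifted = tv_to_uniform(lazy), tv_to_uniform(projected)
print(tv_lazy, tv_lifted)
```

Summing the two copies of each node is the projection back to the original ring; after a number of steps linear in $n$ the projected lifted walk is closer to uniform than the reversible walk.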

The following question arose from the work of [9]: given graph $G$ and random walk $P$ on $G$,
is it possible to design a non-reversible random walk on the lifted version of $G$ which mixes
substantially faster than $P$? Can it mix in $O(D)$? This question was addressed in a subsequent
work by Chen, Lovasz and Pak [7]. They provided an explicit construction of a random walk on
a lifted version of $G$ with mixing time $O^*(1/\Phi(P))$. Further, they showed that, under the notion
of lifting (implicitly) introduced by [9] and formalized in [7], it is not possible to design such a
lifted random walk with mixing time smaller than $\Omega(1/\Phi(P))$.

Now note that $1/\Phi(P)$ can be much larger than the diameter $D$. As a simple example, consider
a ring graph with $P$ exactly the same as that in Figure 1(a), but with the difference that for
two edges the transition probabilities are $\delta(n)$ instead of 1/4 (and the transition probabilities
of endpoints of these edges appropriately adjusted). Then, it can be checked that $1/\Phi(P)$ is
$O(n/\delta(n))$, which can be arbitrarily poor compared to the diameter $n$ by choosing small enough
$\delta(n)$. A more interesting example showing this poorer scaling of $1/\Phi(P)$ compared to the diameter,
even for the Metropolis-Hastings style construction, is presented in Section 3 in the context of
a "Barbell graph" (see Figure 2). Thus, the lifting approach of [9] [7] cannot lead to a random
walk with mixing time of the order of the diameter and hence the question of existence or design of
such a random walk remains unresolved.

As noted earlier, the lifted random walk can be used to design iterative algorithms (for
computing an average) on the original graph since the topology of the lifted graph and the
stationary distribution of the lifted random walk "project back" onto those of the original
graph and the random walk respectively. However, running an algorithm based on lifted random
walks on the original graph requires additional states. Specifically, the lifted random walk based
algorithm can be simulated on the original graph by running multiple threads on each node.
In particular, the number of operations performed per iteration across the network depends on
the size⁴ of the lifted walk (or graph). In the construction of [7] for a general graph, this issue
about the size of the lifted walk was totally ignored as the authors' interest was only the time
complexity, not the size. Therefore, even though time may reduce under the construction of [7],
the overall cost (equal to the product of time and size) may not be reduced; or even worse, it
may increase.

Therefore, from the perspective of the application of iterative algorithms we need a notion of 
lifting that leads to a design of a random walk that has (a) mixing time of the order of diameter 
of the original graph and (b) the smallest possible size. 

² For a function $f : \mathbb{N} \to \mathbb{R}_+$, $O^*(f(n)) := O(f(n)\,\mathrm{poly}(\log n))$.
³ The details are given in Section 5.
⁴ In this paper, the size of a random walk (resp. graph) is the number of non-zero entries in its
transition matrix (resp. number of edges in the graph).

1.2. Our contributions

In this paper, we answer the above stated question affirmatively. As noted earlier, the notion of
lifting of [9] [7] cannot help in answering this question. For this reason, we introduce a notion
of pseudo-lifting which can be thought of as a relaxation of the notion of lifting. Like lifting,
the notion of pseudo-lifting preserves the topological constraints of the original graph. But the
relaxation comes in preserving the stationary distribution in an approximate manner. However,
it should be noted that it is still possible to use the pseudo-lifted random walk to perform the
iterative algorithm without any approximation errors (or to sample objects from a stationary
distribution without any additional errors) since the stationary distribution of pseudo-lifting
under a restricted projection provides the original stationary distribution exactly. Thus,
operationally our notion of pseudo-lifting is as effective as lifting.

First, we use pseudo-lifting to design a random walk with mixing time of the order of the diameter
of a given graph with the desired stationary distribution. To achieve this, we first use the
Metropolis-Hastings method to construct a random walk $P$ on the given graph $G$ with the
desired stationary distribution. Then, we pseudo-lift this $P$ to obtain a random walk with mixing
time of the order of the diameter of $G$. This approach is stated as Theorem 5.

As discussed earlier, the utility of such constructions lies in the context of graphs with
geometry. The graphs with (fixed) finite doubling dimension, introduced in O [13] [HI [8], serve as an
excellent model for such a class of graphs. Roughly speaking, a graph has doubling dimension $\rho$
if the number of nodes within the shortest path distance $r$ of any node of $G$ is $O(r^\rho)$ (i.e.
polynomial growth of the neighborhood of a node). We apply our construction of pseudo-lifting to
graphs with finite doubling dimension $\rho$ to obtain a random walk with mixing time of the order
of the diameter $D$. In order to address the concern with expansion in the size of the pseudo-lifted
graph, we use the geometry of the original graph explicitly. Specifically, we reduce the size of
the lifted graph by a combination of clustering, geometry and pseudo-lifting. The formal
result is stated as follows and its proof is in Section 6.3.

Theorem 1 Consider a connected graph $G$ with doubling dimension $\rho$ and diameter $D$. It is
possible to explicitly construct a pseudo-lifted random walk on $G$ with mixing time $O(D)$
and size $O\left(D n^{1 - \frac{1}{\rho + 1}}\right)$.



As a specific example, consider a d-dimensional grid whose doubling dimension is d. The 



which is equal to diameter, and in terms of cost per iteration it is lossy by a relatively small 



In general, we can use pseudo-lifting to design iterative algorithms for computing the average
of given numbers on the original graph itself. We describe a precise implementation of such an
algorithm in Section 5. The use of pseudo-lifting, primarily effective for a class of graphs with
geometry, results in the following formal result whose proof is in Section 5.2.

Theorem 2 Consider a given connected graph $G$ with diameter $D$ and each node with a distinct
value. Then, (using a pseudo-lifted random walk) it is possible to design an iterative algorithm
whose $\varepsilon$-computation time is $O^*\left(D \log \frac{1}{\varepsilon}\right)$. Further, if $G$ has doubling dimension $\rho$, then the
network-wide total number of operations (essentially, additions) per iteration of the algorithm is
$O\left(D n^{1 - \frac{1}{\rho + 1}}\right)$.

As a specific example, recall a d-dimensional grid with doubling dimension $d$ and diameter
$n^{1/d}$. The Metropolis-Hastings method will have mixing time $\Omega\left(n^{2/d}\right)$ and per iteration number
of operations $O(n)$. Therefore, the number of total operations is $O\left(n^{1 + 2/d}\right)$ (even the randomized
gossip algorithm of [6] will have this total cost). Compared to this, Theorem 2 implies the number
of iterations would be $O\left(n^{1/d}\right)$ and per iteration cost would be $O\left(n^{1 + \frac{1}{d(d+1)}}\right)$. Therefore, the
total cost is $O\left(n^{1 + \frac{d+2}{d(d+1)}}\right)$, which is essentially close to $O\left(n^{1 + 1/d}\right)$ for large $d$. Thus, it strictly
improves performance over the Metropolis-Hastings method by roughly an $n^{1/d}$ factor. It is worth
noting that no algorithm can have the number of total operations less than $\Omega\left(n^{1 + 1/d}\right)$ and
the number of iterations less than $\Omega\left(n^{1/d}\right)$.
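The exponent arithmetic behind this comparison can be checked with a small sketch. It assumes the cost model described above, ignoring constants and logarithmic factors: diameter $n^{1/d}$, Metropolis-Hastings with $n^{2/d}$ iterations of $n$ operations each, and pseudo-lifting with $D$ iterations of $D\,n^{1 - 1/(d+1)}$ operations each. The function and key names are illustrative.

```python
from fractions import Fraction


def grid_cost_exponents(d):
    """Exponents of n for a d-dimensional grid, ignoring constants and log factors."""
    diam = Fraction(1, d)                            # D = n^(1/d)
    mh_total = 2 * diam + 1                          # n^(2/d) iterations x n operations
    pl_iterations = diam                             # about D iterations
    pl_per_iteration = diam + 1 - Fraction(1, d + 1) # D * n^(1 - 1/(d+1))
    return {
        "metropolis_hastings_total": mh_total,
        "pseudo_lifting_per_iteration": pl_per_iteration,
        "pseudo_lifting_total": pl_iterations + pl_per_iteration,
        "lower_bound_total": 1 + diam,
    }


for d in (2, 3, 10):
    print(d, grid_cost_exponents(d))
```

For $d = 2$ the total-cost exponents are 2 for Metropolis-Hastings and 5/3 for pseudo-lifting, against a lower bound of 3/2; the gap between the two methods is the exponent $1/(d+1)$.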



For the application of interest of this paper, it was necessary to introduce a new notion of
lifting and indeed we found one such notion, i.e. pseudo-lifting. In general, it is likely that for
certain other applications such a notion may not exist. For this reason, we undertake the question
of designing a lifted (not pseudo-lifted) random walk with the smallest possible size since the
size (as well as the mixing time) decides the cost of the algorithm that uses lifting. Note that
the average-computing algorithm in Section 5 can also be implemented via lifting instead of
pseudo-lifting, and the size of the lifting determines the total number of operations.⁵ As the first step,
we consider the construction of Chen, Lovasz and Pak [7]. We find that it is rather lossy in its
size. Roughly speaking, their construction tries to build a logical complete graph topology using
the underlying graph structure. In order to construct one of the edges of this complete graph
topology, they use a solution of a flow optimization problem. This solution results in multiple
paths between a pair of nodes. Thus, in principle, their approach can lead to a very large size. In
order to reduce this size, we use two natural ideas: one, use a sparse expander graph instead of
the complete graph and two, use a solution of unsplittable flows [19]. Intuitively, this approach
seems reasonable but in order to make it work, we need to overcome rather non-trivial technical
challenges. To address these challenges, we develop a method to analyze hybrid non-reversible
random walks, which should be of interest in its own right. The formal result is stated as follows;
see Section 6 for its complete proof.

Theorem 3 Consider a given connected graph $G$ with a random walk $P$. Then, there exists a
lifted random walk with mixing time $O^*(1/\Phi(P))$ and size $O^*(|E(P)|/\Phi(P))$, where

$$E(P) = \{(i, j) : P_{ij} \neq 0 \text{ or } P_{ji} \neq 0\}.$$



⁵ One can derive its explicit performance bound as in Theorem 2. It turns out that lifting is worse
than pseudo-lifting in its performance, but it is more robust in its construction.

Note that the lifted random walk in [7] has size $\Omega(n^2/\Phi(P))$, hence our lifting construction leads
to the reduction of its size by a $\Theta(n)$ factor when $G$ is sparse.⁶ Finally, we note that the methods
developed for understanding the expander-based construction (and proof of Theorem 3) can be
useful in making pseudo-lifting more robust, as discussed in Section 7.



2. Preliminaries and Backgrounds 
2.1. Key notions and definitions 

In this paper, $G = (V, E)$ is a given graph with n nodes, i.e. $|V| = n$. We may use $V(G)$ to
represent the vertex set $V$ of $G$. $P$ always denotes a transition matrix of a graph conformant random
walk (or Markov chain) on $G$ with its stationary distribution $\pi$, i.e. $P_{ij} > 0$ only if $(i, j) \in E$, and
$\pi^T P = \pi^T$. We will use the notion of "Markov chain" or "random walk" depending on which
notion is more relevant to the context. The reverse chain $P^*$ of $P$ is defined as: $P^*_{ij} = \pi_j P_{ji} / \pi_i$
for all $(i, j) \in E$. We call $P$ reversible if $P = P^*$. Hence, if $\pi$ is uniform,⁷ a reversible $P$ is a
symmetric matrix. The conductance of $P$ is defined as

$$\Phi(P) = \min_{S \subset V} \frac{\sum_{i \in S,\, j \in V \setminus S} \pi_i P_{ij}}{\pi(S)\, \pi(V \setminus S)},$$

where $\pi(A) = \sum_{i \in A} \pi_i$.

Although there are various (mostly equivalent) definitions of mixing time that are considered
in the literature based on different measures of the distance between distributions, we primarily
consider the definition of mixing time from the stopping rule. A stopping rule $\Gamma$ is a stopping
time based on the random walk of $P$: at any time, it decides whether to stop or not, depending
on the walk seen so far and possibly additional coin flips. Suppose the starting node is
drawn from distribution $\sigma$. If the distribution of the stopping node is $\tau$, we
call $\Gamma$ a stopping rule from $\sigma$ to $\tau$. Let $\mathcal{H}(\sigma, \tau)$ be the infimum of mean length over all such
stopping rules from $\sigma$ to $\tau$. This is well-defined as there exists the following stopping rule from
$\sigma$ to $\tau$: select $i$ with probability $\tau_i$ and walk until getting to $i$. Now, we present the definition of
the (stopping rule based) mixing time $\mathcal{H}$.

Definition 1 (Mixing time) $\mathcal{H} = \max_{\sigma} \mathcal{H}(\sigma, \pi)$.

Therefore, to bound $\mathcal{H}$, we need to design a stopping rule whose distribution of stopping nodes
is $\pi$.



2.2. Metropolis-Hastings method 

The Metropolis-Hastings method (or Glauber dynamics [18]) has been extensively studied in
recent years due to its local constructibility. For a given graph $G = (V, E)$ and distribution $\pi$
on $V$, the goal is to produce a random walk $P$ on $G$ whose stationary distribution is $\pi$. The
underlying idea of the random walk produced by this method is choosing a neighbor $j$ of the

⁶ A graph $G = (V, E)$ is sparse if $|E| = O(|V|)$.
⁷ $\pi$ is uniform when $\pi_i = 1/n$ for all $i$.




Jung-Shah-Shin 



current vertex $i$ uniformly at random and moving to $j$ depending on the ratio between $\pi_i$ and
$\pi_j$. Hence, its explicit transition matrix $P$ is the following:



where $d_i$ is the degree of vertex $i$ and $d = \max_i d_i$. It is easy to check that $\pi^T P = \pi^T$ and $P$ is
reversible.
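As a hedged illustration of the method, the sketch below uses one common max-degree variant, $P_{ij} = \frac{1}{2d}\min\{1, \pi_j/\pi_i\}$ on edges with the remaining mass placed on the self-loop. This specific form is an assumption and may differ from the matrix intended above in its constants. Exact rational arithmetic makes the stationarity and reversibility checks exact.

```python
from fractions import Fraction


def metropolis_hastings(adj, pi):
    """Max-degree Metropolis-Hastings walk (assumed variant, see lead-in).

    adj[i] lists the neighbors of i (no self-loops); pi is the target distribution.
    """
    n = len(adj)
    d = max(len(nbrs) for nbrs in adj)
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            # propose j w.p. 1/(2d), accept w.p. min(1, pi_j / pi_i)
            P[i][j] = Fraction(1, 2 * d) * min(Fraction(1), pi[j] / pi[i])
        P[i][i] = 1 - sum(P[i])  # remaining probability stays at i
    return P


# Path graph 0 - 1 - 2 - 3 with a non-uniform target distribution.
adj = [[1], [0, 2], [1, 3], [2]]
pi = [Fraction(k, 10) for k in (1, 2, 3, 4)]
P = metropolis_hastings(adj, pi)

n = len(adj)
stationary = all(sum(pi[i] * P[i][j] for i in range(n)) == pi[j] for j in range(n))
reversible = all(pi[i] * P[i][j] == pi[j] * P[j][i] for i in range(n) for j in range(n))
print(stationary, reversible)
```

Each node only needs its own degree bound and the ratio $\pi_j/\pi_i$ for its neighbors, which is the local constructibility mentioned above.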

2.3. Lifting 

As stated in the introduction, motivated by a simple ring example of Diaconis et al. [9], Chen
et al. [7] use the following notion of lifting.

Definition 2 (Lifting) A random walk $\hat{P}$ on graph $\hat{G} = (\hat{V}, \hat{E})$ is called a lifting of random
walk $P$ on graph $G = (V, E)$ if there exists a many-to-one function $f : \hat{V} \to V$ such that the
following holds: (a) for any $u, v \in \hat{V}$, $(u, v) \in \hat{E}$ only if $(f(u), f(v)) \in E$; (b) for any $u, v \in V$,
$\pi(u) = \hat{\pi}(f^{-1}(u))$ and $Q(u, v) = \hat{Q}(f^{-1}(u), f^{-1}(v))$. Here $Q$ and $\hat{Q}$ are ergodic flow matrices
for $P$ and $\hat{P}$ respectively.

Here, the ergodic flow matrix $Q = [Q_{ij}]$ of $P$ is defined as: $Q_{ij} = \pi_i P_{ij}$. It satisfies: $\sum_{i,j} Q_{ij} = 1$,
$\sum_i Q_{ij} = \sum_i Q_{ji}$ and $\sum_i Q_{ij} = \pi_j$. Conversely, every non-negative matrix $Q$ with these properties
defines a random walk with the stationary distribution $\pi$. In this paper, $\hat{P}$ means a lifted (or
pseudo-lifted) random walk of $P$. Similarly $\hat{G}$, $\hat{V}$, $\hat{E}$ and $\hat{\pi}$ are the lifted (or pseudo-lifted)
versions of their original ones.
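The correspondence between a random walk and its ergodic flow can be sketched as follows, using a small non-reversible walk on three nodes chosen only for illustration; the helper names are hypothetical.

```python
from fractions import Fraction


def ergodic_flow(P, pi):
    """Q_ij = pi_i * P_ij: the stationary probability mass crossing edge (i, j) per step."""
    n = len(P)
    return [[pi[i] * P[i][j] for j in range(n)] for i in range(n)]


def walk_from_flow(Q):
    """Recover (P, pi) from a flow matrix: pi_i is the i-th row sum and P_ij = Q_ij / pi_i."""
    n = len(Q)
    pi = [sum(Q[i]) for i in range(n)]
    P = [[Q[i][j] / pi[i] for j in range(n)] for i in range(n)]
    return P, pi


# A biased walk on a 3-cycle: non-reversible, with uniform stationary distribution.
a, b = Fraction(3, 4), Fraction(1, 4)
P = [[0, a, b],
     [b, 0, a],
     [a, b, 0]]
pi = [Fraction(1, 3)] * 3

Q = ergodic_flow(P, pi)
total = sum(sum(row) for row in Q)
row_sums = [sum(Q[i]) for i in range(3)]
col_sums = [sum(Q[i][j] for i in range(3)) for j in range(3)]
P_back, pi_back = walk_from_flow(Q)
print(total, row_sums, col_sums)
```

The flow matrix here is not symmetric ($Q_{01} \neq Q_{10}$), which reflects non-reversibility, yet its row and column sums agree and equal $\pi$.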

Chen et al. [7] provided an explicit construction to lift a given general random walk $P$ with
almost optimal speed-up in terms of mixing time. Specifically, they obtained the following result.

Theorem 4 ([7]) For a given random walk $P$, it is possible to explicitly construct a lifted
random walk of $P$ with mixing time $O^*(1/\Phi(P))$. Furthermore, any lifted random walk of $P$ needs
at least $\Omega(1/\Phi(P))$ time to mix.

2.4. Auxiliary backgrounds
2.4.1. ε-Mixing time

Here we introduce a different (and related) notion of mixing time which measures more explicitly
how fast the random walk converges to stationarity. The following notions, $\tau(\varepsilon)$ and $\tau_2(\varepsilon)$, are
related to $\mathcal{H}$. This relation can be found in detail in the survey by Lovasz and Winkler [22]. For
example, we will use this relation explicitly in Lemma 8.

Now we define these related notions of mixing time. To this end, as before consider a
random walk $P$ on a graph $G = (V, E)$. Let $P^t(x, \cdot)$ denote the distribution of the state after $t$
steps under $P$, starting from an initial state $x \in V$. For the random walk of our interest, $P^t(x, \cdot)$
goes to $\pi$ as $t \to \infty$. We present the definitions based on the total variation distance and the
$\chi^2$-distance.




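(Case conditions displaced from the transition matrix of Section 2.2: if $(i, j) \in E$; if $(i, j) \notin E$ and $i \neq j$; if $i = j$.)

Definition 3 (ε-Mixing time) Given $\varepsilon > 0$, let $\tau(\varepsilon)$ and $\tau_2(\varepsilon)$ represent the ε-mixing time of the
random walk with respect to the total variation distance and the $\chi^2$-distance respectively. Then,
they are

$$\tau(\varepsilon) = \min\Big\{ t : \forall x \in V,\ \sum_{y \in V} \left| P^t(x, y) - \pi(y) \right| \le \varepsilon \Big\},$$

$$\tau_2(\varepsilon) = \min\Big\{ t : \forall x \in V,\ \sum_{y \in V} \frac{1}{\pi(y)} \left( P^t(x, y) - \pi(y) \right)^2 \le \varepsilon \Big\}.$$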
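A brute-force sketch of the total-variation ε-mixing time follows. It takes the distance to be $\sum_y |P^t(x, y) - \pi(y)|$ without a factor 1/2, which is an assumption about the normalization (a factor 1/2 is also common), and tracks every row of $P^t$ explicitly, so it is only meant for small chains. The function names are illustrative.

```python
def mixing_time_tv(P, pi, eps, max_steps=10_000):
    """Smallest t such that, from every start x, sum_y |P^t(x, y) - pi(y)| <= eps."""
    n = len(P)
    rows = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]  # rows of P^0
    for t in range(max_steps + 1):
        worst = max(sum(abs(rows[x][y] - pi[y]) for y in range(n)) for x in range(n))
        if worst <= eps:
            return t
        # advance every starting state by one step: row_x <- row_x * P
        rows = [[sum(rows[x][k] * P[k][y] for k in range(n)) for y in range(n)]
                for x in range(n)]
    raise ValueError("chain did not mix within max_steps")


def lazy_ring(n):
    """Ring walk that stays w.p. 1/2 and moves to each neighbor w.p. 1/4."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 0.5
        P[i][(i + 1) % n] += 0.25
        P[i][(i - 1) % n] += 0.25
    return P


complete = [[0.25] * 4 for _ in range(4)]          # P = [1/n] on the complete graph, n = 4
t_complete = mixing_time_tv(complete, [0.25] * 4, 0.01)
t_ring_loose = mixing_time_tv(lazy_ring(8), [0.125] * 8, 0.1)
t_ring_tight = mixing_time_tv(lazy_ring(8), [0.125] * 8, 0.01)
print(t_complete, t_ring_loose, t_ring_tight)
```

The walk $P = [1/n]$ reaches the uniform distribution after a single step, while the ring walk needs more steps, and more still for a smaller ε.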

2.4.2. Additional Techniques to bound Mixing Times

Various techniques have been developed over the past three decades or so to estimate the mixing time
of a given random walk. The relation between the conductance and the mixing time in the
introduction is one of them. We review some of the other key techniques that will be relevant
for this paper.

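Fill-up Lemma. Sometimes, due to the difficulty of designing an exact stopping rule, we
use the following strategy for bounding the mixing time $\mathcal{H}$.

Step 1. For a positive constant $\varepsilon$ and any starting distribution $\sigma$, we design a stopping
rule whose stopping distribution $\gamma$ is $\varepsilon$-far from $\pi$ (i.e. $\gamma \ge (1 - \varepsilon)\pi$). This gives the upper
bound for $\mathcal{H}(\sigma, \gamma)$.

Step 2. We bound $\mathcal{H}$ by $\mathcal{H}(\sigma, \gamma)$ using the following fact known as the fill-up Lemma in [1]:

$$\mathcal{H} \le \frac{1}{1 - \varepsilon}\, \mathcal{H}_{\varepsilon},$$

where $\mathcal{H}_{\varepsilon} = \max_{\sigma} \min_{\gamma \ge (1 - \varepsilon)\pi} \mathcal{H}(\sigma, \gamma)$.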

Eigenvalue. If $P$ is reversible, one can view $P$ as a self-adjoint operator on a suitable inner product
space and this permits us to use the well-understood spectral theory of self-adjoint operators.
It is well-known that $P$ has $n = |V|$ real eigenvalues $1 = \lambda_0 \ge \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{n-1} \ge -1$. The
ε-mixing time $\tau_2(\varepsilon)$ is related as

$$\tau_2(\varepsilon) \le \frac{1}{\lambda_P} \log \frac{1}{\varepsilon \pi_0},$$
where $\lambda_P = 1 - \max\{|\lambda_1|, |\lambda_{n-1}|\}$ and $\pi_0 = \min_i \pi_i$. The quantity $\lambda_P$ is also called the spectral gap of $P$.
When $P$ is non-reversible, we consider $PP^*$. It is easy to see that the Markov chain with $PP^*$ as
its transition matrix is reversible. Let $\lambda_{PP^*}$ be the spectral gap of this reversible Markov chain.
Then, the mixing time of the original Markov chain (with its transition matrix $P$) is bounded
above as:

$$\tau_2(\varepsilon) \le \frac{2}{\lambda_{PP^*}} \log \frac{1}{\varepsilon \pi_0}. \qquad (2)$$

3. Pseudo-Lifting

Here our aim is to obtain a random walk with mixing time of the order of the diameter for
a given graph $G$ and stationary distribution $\pi$. As explained in the introduction, the following
approach based on lifting does not work for this aim: first obtain a random walk with the desired
stationary distribution using the Metropolis-Hastings method, and then lift it using the method
in [7].

For example, consider the Barbell graph $G$ as shown in Figure 2: two complete graphs of $n/2$
nodes connected by a single edge. Suppose $\pi$ is uniform. Now, consider a random walk
$P$ produced by the Metropolis-Hastings method: the next transition is uniform among all the
neighbors of each node. For such a random walk, it is easy to check that $1/\Phi(P) = \Omega(n^2)$ and
$\mathcal{H} = \Omega(n^2)$. Therefore, the mixing time of any lifting is at least $\Omega(n^2)$. However, this random
walk is ill-designed to begin with because $1/\Phi(P)$ can be decreased to $O(n)$ by defining the
random walk in another way (i.e. increasing the probability of its linkage edge, and adding
self-loops to non-linkage nodes so as not to change its stationary distribution). Nevertheless,
$1/\Phi(P)$ is still far from the diameter $D = O(1)$. Hence, from Theorem 3, lifting cannot achieve
$O(D)$-mixing.




Fig 2. The Barbell graph with 12 nodes. 



Motivated by this limitation, we will use the following new notion of lifting, which we call
pseudo-lifting, to design an $O(D)$-mixing random walk.

Definition 4 (Pseudo-Lifting) A random walk $\hat{P}$ is called a pseudo-lifting of $P$ if there exists
a many-to-one function $f : \hat{V} \to V$ and $T \subset \hat{V}$ with $|T| = |V|$ such that the following holds: (a) for
any $u, v \in \hat{V}$, $(u, v) \in \hat{E}$ only if $(f(u), f(v)) \in E$, and (b) for any $u \in V$, $\hat{\pi}(f^{-1}(u) \cap T) = \frac{1}{2}\pi(u)$.⁸

The property (a) in the definition implies that one can simulate the pseudo-lifting $\hat{P}$ in the
original graph $G$. Furthermore, the property (b) suggests that (by concentrating on the set $T$), it
is possible to simulate the stationary distribution $\pi$ exactly via pseudo-lifting. Next we present
its construction.

3.1. Construction

For a given random walk $P$, we will construct the pseudo-lifted random walk $\hat{P}$ of $P$. It may be
assumed that $P$ is given by the Metropolis-Hastings method. We will construct the pseudo-lifted
graph $\hat{G}$ by adding vertices and edges to $G$, and decide the values of the ergodic flows $\hat{Q}$ on $\hat{G}$,
which defines its corresponding random walk $\hat{P}$.

⁸ In fact, 1/2 can be replaced by any constant between 0 and 1.

First, select an arbitrary node $v$. Now, for each $w \in V$, there exist paths $\mathcal{P}_{wv}$ and $\mathcal{P}_{vw}$, from
$w$ to $v$ and $v$ to $w$ respectively. We will assume that all the paths are of length $D$: this can be
achieved by repeating the same nodes. Now, we construct a pseudo-lifted graph $\hat{G}$ starting from $G$.

First, create a new node $v'$ which is a copy of the chosen vertex $v$. Then, for every node $w$,
add a directed path $\mathcal{P}'_{wv}$, a copy of $\mathcal{P}_{wv}$, from $w$ to $v'$. Similarly, add $\mathcal{P}'_{vw}$ (a copy of $\mathcal{P}_{vw}$) from $v'$
to $w$. Each addition creates $D - 1$ new interior nodes. Thus, we have essentially created a virtual
star topology using the paths of the old graph by adding $O(nD)$ new nodes in total. (Every new
node is a copy of an old node.)

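Now, we define the ergodic flow matrix $\hat{Q}$ for this graph $\hat{G}$ as follows: for an edge $(i, j)$,

$$\hat{Q}_{ij} = \begin{cases} \dfrac{\delta}{2D}\, \pi_w & \text{if } (i, j) \in E(\mathcal{P}'_{wv}) \text{ or } E(\mathcal{P}'_{vw}), \\[2mm] (1 - \delta)\, Q_{ij} & \text{if } (i, j) \in E(G), \end{cases}$$

where $\delta \in [0, 1]$ is a constant we will decide later in (3). It is easy to check that $\sum_{ij} \hat{Q}_{ij} = 1$
and $\sum_j \hat{Q}_{ij} = \sum_j \hat{Q}_{ji}$. Hence it defines a random walk on $\hat{G}$. The stationary distribution of this
pseudo-lifting is

$$\hat{\pi}_i = \begin{cases} \dfrac{\delta}{2D}\, \pi_w & \text{if } i \in \left( V(\mathcal{P}'_{wv}) \cup V(\mathcal{P}'_{vw}) \right) \setminus \{w, v'\}, \\[2mm] \left( 1 - \delta \left( 1 - \dfrac{1}{2D} \right) \right) \pi_i & \text{if } i \in V(G), \\[2mm] \dfrac{\delta}{2D} & \text{if } i = v'. \end{cases}$$

Given the above definition of $\hat{Q}$ and corresponding stationary distribution $\hat{\pi}$, it satisfies the
requirements of pseudo-lifting in Definition 4 if we choose $\delta$ such that

$$\frac{1}{2} = \delta \left( 1 - \frac{1}{2D} \right) \qquad (3)$$

and $T = V(G)$; i.e. $T$ is the set of old nodes.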
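The flow bookkeeping of this construction can be checked on a small instance. The sketch below assumes a flow of $\delta \pi_w / (2D)$ on every edge of the two added paths of node $w$, a flow of $(1 - \delta) Q_{ij}$ on every original edge, and $\delta = D/(2D - 1)$ as the solution of (3). The node labels are hypothetical, and the map $f$ that records which old node each new interior node copies is omitted because it does not affect the flows.

```python
from fractions import Fraction


def pseudo_lift_flows(P, pi, D):
    """Ergodic flows of a pseudo-lifted walk (sketch under the assumptions in the lead-in).

    Nodes: ("old", w), the root ("root",) standing for v', and D - 1 interior nodes on
    each added path, labeled ("in", w, k) towards the root and ("out", w, k) away from it.
    """
    n = len(P)
    delta = Fraction(D, 2 * D - 1)          # solves 1/2 = delta * (1 - 1/(2D))
    Q_hat = {}
    for i in range(n):                      # original edges, including self-loops
        for j in range(n):
            if P[i][j] != 0:
                Q_hat[(("old", i), ("old", j))] = (1 - delta) * pi[i] * P[i][j]
    root = ("root",)
    for w in range(n):
        flow = delta * pi[w] / (2 * D)
        to_root = [("old", w)] + [("in", w, k) for k in range(1, D)] + [root]
        from_root = [root] + [("out", w, k) for k in range(1, D)] + [("old", w)]
        for path in (to_root, from_root):   # each path has exactly D edges
            for a, b in zip(path, path[1:]):
                Q_hat[(a, b)] = flow
    return Q_hat, delta


def node_flows(Q_hat):
    """Total outgoing and incoming flow at every node."""
    out_flow, in_flow = {}, {}
    for (a, b), q in Q_hat.items():
        out_flow[a] = out_flow.get(a, 0) + q
        in_flow[b] = in_flow.get(b, 0) + q
    return out_flow, in_flow


# Ring on 5 nodes (diameter 2): stay w.p. 1/2, each neighbor w.p. 1/4.
n, D = 5, 2
P = [[Fraction(0)] * n for _ in range(n)]
for i in range(n):
    P[i][i] = Fraction(1, 2)
    P[i][(i + 1) % n] = Fraction(1, 4)
    P[i][(i - 1) % n] = Fraction(1, 4)
pi = [Fraction(1, n)] * n

Q_hat, delta = pseudo_lift_flows(P, pi, D)
out_flow, in_flow = node_flows(Q_hat)
print(sum(Q_hat.values()), out_flow[("old", 0)], out_flow[("root",)])
```

The outgoing flow at a node is its stationary probability, so the check that every old node carries exactly $\pi_w / 2$ is property (b) of Definition 4 with $T = V(G)$.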



3.2. Mixing time

We claim the following bound on the mixing time of the pseudo-lifting we constructed.

Theorem 5 The mixing time of the random walk $\hat{P}$ defined by $\hat{Q}$ is $O(D)$.

Proof. We will design a stopping rule where the distribution of the stopping node is $\hat{\pi}$, and
analyze its expected length. At first, walk until visiting $v'$, and toss a coin $X$ with the following
probability.

$$X = \begin{cases} 0 & \text{with probability } \dfrac{\delta}{2D}, \\[2mm] 1 & \text{with probability } \dfrac{\delta (D - 1)}{2D}, \\[2mm] 2 & \text{with probability } 1 - \delta + \dfrac{\delta}{2D}, \\[2mm] 3 & \text{with probability } \dfrac{\delta (D - 1)}{2D}. \end{cases}$$

Depending on the value of X, the stopping node is decided as follows. 
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o X = : Stop at v' . The probability for stopping at v' is Pr[X = 0] = which is exactly 

TTy/. 

o X = 1 : Walk a directed path P^^ , and choose an interior node of uniformly at random, 
and stop there. For a given u;, the probability for walking is easy to check vr^. There 
are D — 1 many interior nodes, hence, for an interior node i of P^y^, the probability for 
stopping at i is 

1 S 

Pr[X = 1] X TT^ X jy—^ = ^TT^ = TTi. 

o X = 2 : Stop at the end node w of P/,^ . The probability for stopping at w is 

Pr[X = 2] X Pr[walk P^] = (^1 - + x vr^ = 

o X = 3 : Walk until getting a directed path P^^, and choose an interior node of P^^ 
uniformly at random, and stop there. Until getting a directed path P^^, the pseudo-lifted 
random walk defined by Q is same as the original random walk. Since the distribution 
w G V{G) of the walk at the end of the previous step is exactly vr, it follows that the 
distribution vr over the nodes of V{G) is preserved under this walk till walking on P^^,. 
From the same calculation as the case X = 1, the probability of stopping at the interior 
node i of P^^ is vf j . 

Therefore, we have established the existence of a stopping rule that takes an arbitrary starting distribution to the stationary distribution $\hat{\pi}$. Now, this stopping rule has average length $O(D/\delta_1)$: since the probability of getting on a directed path $\mathcal{P}'_{wv}$ at $w$ is $\frac{\delta_1}{2D} / \left(1 - \delta_1 + \frac{\delta_1}{2D}\right) = \Theta(\delta_1/D)$, the expected number of steps until visiting $v'$, and until getting on a directed path when $X = 3$, is $O(D/\delta_1) = O(D)$ from (3) in both cases. This completes the proof. □
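As a sanity check on the coin used in this stopping rule, the following minimal Python sketch evaluates the four probabilities exactly and confirms that they sum to one, and that the per-node stopping probability in the case $X = 1$ equals $\frac{\delta_1}{2D}\pi_w$. The function name `coin_distribution` and the sample values ($D = 5$, $\delta_1 = 1/2$, $\pi_w = 1/7$) are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def coin_distribution(D, delta):
    """Exact probabilities Pr[X = k], k = 0..3, of the coin in the stopping rule."""
    delta = Fraction(delta)
    return [
        delta / (2 * D),              # X = 0: stop at the root copy v'
        delta * (D - 1) / (2 * D),    # X = 1: interior node of a path leaving v'
        1 - delta + delta / (2 * D),  # X = 2: old node w at the end of that path
        delta * (D - 1) / (2 * D),    # X = 3: interior node of a path entering v'
    ]

D = 5
probs = coin_distribution(D, Fraction(1, 2))
total = sum(probs)

# Per-node check for X = 1: Pr[X = 1] * pi_w / (D - 1) should equal (delta / 2D) * pi_w.
pi_w = Fraction(1, 7)
per_interior_node = probs[1] * pi_w / (D - 1)
expected = Fraction(1, 2) / (2 * D) * pi_w
```

Exact rational arithmetic is used so that the identities hold without rounding error.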



4. Pseudo-Lifting: use of geometry 

The graph topologies arising in practice, such as those of wireless sensor networks deployed in some geographic area or nearest-neighbor networks of unmanned vehicles [30], do possess geometry and are far from being expanders. A good model for graphs with geometry is the class of graphs with finite doubling dimension, which is defined as follows.

Definition 5 (Doubling Dimension) Consider a metric space $\mathcal{M} = (X, d)$, where $X$ is the set of points endowed with a metric $d$. Given $x \in X$, define the ball of radius $r \in \mathbb{R}_+$ around $x$ as $B(x, r) = \{y \in X : d(x, y) < r\}$. Define

$$\rho(x, r) = \inf\{K \in \mathbb{N} : \exists\, y_1, \dots, y_K \in X,\ B(x, r) \subset \cup_{i=1}^{K} B(y_i, r/2)\}.$$

Then $\rho(\mathcal{M}) = \sup_{x \in X, r \in \mathbb{R}_+} \rho(x, r)$ is called the doubling constant of $\mathcal{M}$, and $\log_2 \rho(\mathcal{M})$ is called the doubling dimension of $\mathcal{M}$. The doubling dimension of a graph $G = (V, E)$ is defined with respect to the metric induced on $V$ by the shortest path metric.

For graphs with finite doubling dimension, we will design a pseudo-lifting of efficient size. Recall that the basic idea of the construction of the pseudo-lifting in Section 3 is to create a virtual star topology using paths from every node to a fixed root, and that the total length of these paths determines the size of the pseudo-lifting. To reduce the overall length of paths, we consider clusters of nodes such that nodes in each cluster are close to each other, and pick a sub-root node in each cluster. We then build a star topology in each cluster around its sub-root and connect every sub-root to the root. This creates a hierarchical star topology (or, say, a tree topology), as in the example of the line graph in Figure 3(b). Since only short paths are needed within each cluster, the overall length of paths decreases.

For a good clustering, we need to decide which nodes become sub-roots. A natural candidate is an $R$-net $Y \subset V$ of the graph $G$, defined as follows.

Definition 6 ($R$-net) For a given graph $G = (V, E)$, $Y \subset V$ is an $R$-net if

(a) For every $v \in V$, there exists $u \in Y$ such that the shortest path distance between $u$ and $v$ is at most $R$.

(b) The distance between any two $y, z \in Y$ is more than $R$.

Such an $R$-net can be found in $G$ greedily and, as the proof of Lemma 7 shows, the small doubling dimension of $G$ guarantees the existence of a good $R$-net for our purpose.
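The greedy procedure mentioned above can be sketched as follows. This is a minimal illustration, assuming an unweighted graph given as an adjacency dictionary; the function names `bfs_distances` and `greedy_r_net` and the scan order (increasing node label) are assumptions made for the example, not choices specified in the paper.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def greedy_r_net(adj, R):
    """Scan the nodes; add a node to the net whenever no net point is within distance R."""
    net = []
    covered = set()
    for v in sorted(adj):
        if v not in covered:
            net.append(v)
            dist = bfs_distances(adj, v)
            covered.update(u for u, d in dist.items() if d <= R)
    return net

# Line graph with 10 nodes, as in the running example of this section.
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
net = greedy_r_net(line, R=2)
```

A node joins the net only if it is farther than $R$ from every earlier net point, which gives property (b); every node is either a net point or was covered by one, which gives property (a).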



Fig 3. For a given line graph with $n$ nodes, (a) is the star topology used in the construction of the pseudo-lifted graph in Section 3.1; (b) is the hierarchical star topology which will be used in this section for the new construction of pseudo-lifting.



4.1. Construction 

For a given random walk $P$, we will construct the pseudo-lifted random walk $\hat{P}$ of $P$ using a hierarchical star topology. Again, denote by $\pi$ and $G = (V, E)$ the stationary distribution and the underlying graph of $P$. As in the previous construction in Section 3.1, we will construct the pseudo-lifted graph $\hat{G}$ by extending $G$, and define the ergodic flow matrix $\hat{Q}$ on $\hat{G}$, which leads to its corresponding random walk $\hat{P}$.

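Given an $R$-net $Y$, match each node $w$ to the nearest $y \in Y$ (breaking ties arbitrarily). Let $C_y = \{w : w \text{ is matched to } y\}$ for $y \in Y$. Clearly, $V = \cup_{y \in Y} C_y$. Finally, for each $y \in Y$ and for any $w \in C_y$, we have paths $\mathcal{P}_{wy}, \mathcal{P}_{yw}$ between $w$ and $y$ of length exactly $R$. Also, for each $y \in Y$, there exist paths $\mathcal{P}_{yv}, \mathcal{P}_{vy}$ between $y$ and the root $v$ (selected below) of length exactly $D$ (we allow the repetition of nodes to hit this length exactly).

Now, we construct the pseudo-lifted graph $\hat{G}$. As in the construction in Section 3.1, select an arbitrary node $v \in V$ and create its copy $v'$. Further, for each $y \in Y$, create two copies $y'_1$ and $y'_2$. Now, add a directed path $\mathcal{P}'_{wy}$, a copy of $\mathcal{P}_{wy}$, from $w$ to $y'_1$, and add $\mathcal{P}'_{yv}$, a copy of $\mathcal{P}_{yv}$, from $y'_1$ to $v'$. Similarly, add $\mathcal{P}'_{vy}$ from $v'$ to $y'_2$ and $\mathcal{P}'_{yw}$ from $y'_2$ to $w$. In total, this construction adds $2D|Y| + 2Rn$ edges to $G$. Now, the ergodic flow matrix $\hat{Q}$ on $\hat{G}$ is defined as follows: for any edge $(i, j)$ of $\hat{G}$,

$$\hat{Q}_{ij} = \begin{cases} \frac{\delta_2}{2(R+D)}\,\pi_w & \text{if } (i,j) \in E(\mathcal{P}'_{wy}) \text{ or } E(\mathcal{P}'_{yw}), \\ \frac{\delta_2}{2(R+D)}\,\pi(C_y) & \text{if } (i,j) \in E(\mathcal{P}'_{yv}) \text{ or } E(\mathcal{P}'_{vy}), \\ (1 - \delta_2)\,Q_{ij} & \text{if } (i,j) \in E(G), \end{cases}$$

where $\pi(C_y) = \sum_{w \in C_y} \pi_w$ and $\delta_2 \in [0, 1]$ is a constant decided later. It can be checked that $\sum_{i,j} \hat{Q}_{ij} = 1$ and $\sum_j \hat{Q}_{ij} = \sum_j \hat{Q}_{ji}$. Hence it defines a random walk on $\hat{G}$. The stationary distribution of this pseudo-lifted chain is

$$\hat{\pi}_i = \begin{cases} \frac{\delta_2}{2(R+D)}\,\pi_w & \text{if } i \in (V(\mathcal{P}'_{wy}) \cup V(\mathcal{P}'_{yw})) \setminus \{w, y'_1, y'_2\}, \\ \frac{\delta_2}{2(R+D)}\,\pi(C_y) & \text{if } i \in (V(\mathcal{P}'_{yv}) \cup V(\mathcal{P}'_{vy})) \setminus \{v'\}, \\ \left(1 - \delta_2\left(1 - \frac{1}{2(R+D)}\right)\right)\pi_i & \text{if } i \in V(G), \\ \frac{\delta_2}{2(R+D)} & \text{if } i = v'. \end{cases}$$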

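To guarantee that this chain is indeed a pseudo-lifting of the original random walk $P$, consider $T = V(G)$ and $\delta_2$, where

$$\delta_2\left(1 - \frac{1}{2(R+D)}\right) = \frac{1}{2}. \qquad (4)$$

With this choice, $\hat{\pi}_i = \pi_i / 2$ for every $i \in V(G)$. Note that $\hat{G}$ has exactly $|E| + 2Rn + 2D|Y|$ edges.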
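The bookkeeping behind this construction can be checked numerically. The sketch below assumes the stationary masses are as follows: $\frac{\delta_2}{2(R+D)}\pi_w$ on each of the $R - 1$ interior nodes of the two cluster paths attached to $w$, $\frac{\delta_2}{2(R+D)}\pi(C_y)$ on each of the $D$ nodes other than $v'$ of the two root paths attached to $y$, and $\frac{\delta_2}{2(R+D)}$ at $v'$. It also assumes $\delta_2$ is chosen so that the old nodes keep exactly half of the mass. The function name `lifted_mass` and the sample values $R = 3$, $D = 10$ are illustrative.

```python
from fractions import Fraction

def lifted_mass(R, D):
    """Total stationary mass of each group of nodes in the hierarchical pseudo-lifting."""
    scale = 2 * (R + D)
    # delta2 solves delta2 * (1 - 1/scale) = 1/2, so that old nodes carry mass 1/2.
    delta2 = Fraction(scale, 2 * (scale - 1))
    unit = delta2 / scale
    mass = {
        # two cluster paths per node w, R - 1 interior nodes each, summed over w
        "cluster_path_interiors": 2 * (R - 1) * unit,
        # two root paths per net point y, D nodes each (v' excluded), summed over y
        "root_path_nodes": 2 * D * unit,
        "old_nodes": 1 - delta2 * (1 - Fraction(1, scale)),
        "root_copy": unit,
    }
    return delta2, mass

delta2, mass = lifted_mass(R=3, D=10)
total_mass = sum(mass.values())
```

For these sample values `delta2` equals $13/25$, close to $1/2$, and the four groups of nodes together carry total mass one.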

4.2. Mixing time and size: Proof of Theorem 1

We prove two lemmas about the performance of the pseudo-lifting we constructed; together they imply Theorem 1. First, we state the following result about its mixing time, whose proof is similar to the proof of Theorem 5.

Lemma 6 The mixing time of the random walk $\hat{P}$ defined by $\hat{Q}$ is $O(D)$.

Proof. Consider the following stopping rule. Walk until visiting $v'$, and toss a coin $X$ with the following distribution:

$$X = \begin{cases} 0 & \text{with probability } \frac{\delta_2}{2(R+D)}, \\ 1 & \text{with probability } \frac{\delta_2 D}{2(R+D)}, \\ 2 & \text{with probability } \frac{\delta_2 (R-1)}{2(R+D)}, \\ 3 & \text{with probability } 1 - \delta_2\left(1 - \frac{1}{2(R+D)}\right), \\ 4 & \text{with probability } \frac{\delta_2 (R-1)}{2(R+D)}, \\ 5 & \text{with probability } \frac{\delta_2 D}{2(R+D)}. \end{cases}$$

Depending on the value of $X$, the stopping node is decided as follows.

o $X = 0$: Stop at $v'$.

o $X = 1$: Walk on a directed path $\mathcal{P}'_{vy}$, choose its interior node uniformly at random (here $y'_2$ is counted as an interior node), and stop there.

o $X = 2$: Walk until getting on a directed path $\mathcal{P}'_{yw}$, choose its interior node uniformly at random, and stop there.

o $X = 3$: Walk until reaching an old node in $V(G)$, and stop there.

o $X = 4$: Walk until getting on a directed path $\mathcal{P}'_{wy}$, choose its interior node uniformly at random, and stop there.

o $X = 5$: Walk until getting on a directed path $\mathcal{P}'_{yv}$, choose its interior node uniformly at random (here $y'_1$ is counted as an interior node), and stop there.

(Footnote to the choice of $\delta_2$ in Section 4.1: see equation (4) and check that $\delta_2 \approx 1/2$.)

It can be checked, using arguments similar to those in the proof of Theorem 5, that the distribution of the stopped node is precisely $\hat{\pi}$. Also, we can show that the expected length of this stopping rule is $O\left(\frac{R+D}{\delta_2}\right) = O\left(\frac{D}{\delta_2}\right) = O(D)$ from (4). This is primarily true because the probability of getting on a directed path $\mathcal{P}'_{wy}$ at $w$ is $\Theta(\delta_2/(R + D))$. □

Now we apply the hierarchical construction to the case of graphs with constant doubling 
dimension, and show the guarantee for the size of the pseudo-lifting in terms of its doubling 
dimension. 

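Lemma 7 Given a graph $G$ with a constant doubling dimension $\rho$ and diameter $D$, the hierarchical construction gives a pseudo-lifted graph $\hat{G}$ of size $|\hat{E}| = O(D n^{\frac{\rho}{\rho+1}})$.

Proof. The property of graphs with doubling dimension $\rho$ implies that there exists an $R$-net $Y$ such that $|Y| \le (2D/R)^{\rho}$ (cf. [2]). Consider $R = D\, 2^{\frac{\rho}{\rho+1}} n^{-\frac{1}{\rho+1}}$. This is an appropriate choice because

$$R = D\, 2^{\frac{\rho}{\rho+1}} n^{-\frac{1}{\rho+1}} > D n^{-\frac{1}{\rho+1}} \ge n^{\frac{1}{\rho} - \frac{1}{\rho+1}} > 1$$

(the second inequality is from $n \le D^{\rho}$). Given this, the size of the pseudo-lifted graph $\hat{G}$ is

$$|\hat{E}| = |E| + 2Rn + 2D|Y| \le |E| + 2Rn + 2D\left(\frac{2D}{R}\right)^{\rho} = |E| + O\left(D n^{\frac{\rho}{\rho+1}}\right).$$

Since $|E| = O(n)$ and $D = \Omega(n^{1/\rho})$, we have that $|\hat{E}| = O(D n^{\frac{\rho}{\rho+1}})$. □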
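The choice of $R$ in this proof balances the two terms that the construction adds to the edge count, $2Rn$ (cluster paths) and $2D(2D/R)^{\rho}$ (root paths, using the bound on $|Y|$). The following minimal sketch evaluates both terms for a sample instance; the function name `lifted_size_terms` and the sample values (a $100 \times 100$ grid, so $n = 10^4$, $D = 198$, $\rho = 2$) are assumptions made for illustration.

```python
import math

def lifted_size_terms(n, D, rho):
    """Return R and the two edge-count terms 2Rn and 2D(2D/R)^rho for the balanced R."""
    R = 2 ** (rho / (rho + 1)) * D * n ** (-1 / (rho + 1))
    cluster_edges = 2 * R * n                    # paths between nodes and sub-roots
    root_edges = 2 * D * (2 * D / R) ** rho      # paths between sub-roots and the root
    return R, cluster_edges, root_edges

n, D, rho = 10_000, 198, 2
R, cluster_edges, root_edges = lifted_size_terms(n, D, rho)
# Closed form of either term: 2 * 2^(rho/(rho+1)) * D * n^(rho/(rho+1)).
closed_form = 2 * 2 ** (rho / (rho + 1)) * D * n ** (rho / (rho + 1))
balanced = math.isclose(cluster_edges, root_edges, rel_tol=1e-9)
```

Both terms coincide with the closed form, which is $O(D n^{\frac{\rho}{\rho+1}})$ for constant $\rho$.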
5. Application: Back to Averaging 

As we introduced in the introduction, consider the following computation problem of distributed averaging. Given a connected network graph $G = (V, E)$, where $V = \{1, 2, \dots, n\}$, each node $i \in V$ has a value $x_i \in \mathbb{R}$. Then the goal is to compute the average of $x = [x_i]$ only by communications between adjacent nodes:

$$x_{ave} = \frac{1}{n}\sum_{i=1}^{n} x_i. \qquad (5)$$

This problem arises in many applications such as distributed estimation [31], distributed spectral decomposition [17], estimation and distributed data fusion on ad-hoc networks [23], distributed sub-gradient methods for eigenvalue maximization [5], inference in Gaussian graphical models, and coordination of autonomous agents [15].

5.1. Linear iterative algorithm

A popular and quite simple approach for this computation is a method based on linear iterations [32], as follows. Suppose we are given a graph-conformant random walk $P$ which has the uniform stationary distribution $\pi$, i.e. $\pi^T P = \pi^T$. The linear iteration algorithm is described as follows. At time $t$, each node $i \in V$ has an estimate $y_i(t)$ of $x_{ave}$, and initially $y_i(0) = x_i$. At time $t = 1, 2, \dots$, for each edge $(i, j)$ of $G$, node $i$ sends the value $P_{ij}\, y_i(t)$ to node $j$. Then each node $j$ sums up the values received as its estimate at time $t + 1$, that is

$$y_j(t+1) = \sum_{i=1}^{n} P_{ij}\, y_i(t).$$

Under the condition that $P$ is ergodic, i.e. $P$ is connected and aperiodic, it is known [32] that

$$\lim_{t \to \infty} y(t) = \lim_{t \to \infty} (P^T)^t x = \left(\sum_i x_i\right)\pi = \frac{1}{n}\left(\sum_i x_i\right)\mathbf{1} = x_{ave}\mathbf{1}, \quad \text{where } \mathbf{1} = [1].$$
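The linear iterations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the transition matrix is built with the common Metropolis weights $P_{ij} = 1/\max(d_i, d_j)$ for neighbors (one standard way to obtain a uniform stationary distribution, not necessarily the exact rule used in the paper), and the function names, the 5-node line graph, and the initial values are all illustrative.

```python
def metropolis_hastings(adj):
    """Symmetric transition matrix with uniform stationary distribution on the graph adj."""
    n = len(adj)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            P[i][j] = 1.0 / max(len(adj[i]), len(adj[j]))
        P[i][i] = 1.0 - sum(P[i])  # remaining probability stays at node i
    return P

def linear_iterations(P, x, steps):
    """Run y_j(t+1) = sum_i P[i][j] * y_i(t) for the given number of steps."""
    n = len(x)
    y = list(x)
    for _ in range(steps):
        y = [sum(P[i][j] * y[i] for i in range(n)) for j in range(n)]
    return y

# Line graph 0 - 1 - 2 - 3 - 4 with initial values whose average is 4.
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = linear_iterations(metropolis_hastings(adj), x, steps=500)
```

Every entry of `y` approaches the average of `x`; the end nodes of the line have self-loops, so the chain is aperiodic.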

Specifically, as we already saw in the introduction, the $\varepsilon$-computation time $T_\varepsilon(P)$ is defined as in (6). [Equation (6) is illegible in the source; see the definition given in the introduction.]

The quantity $T_\varepsilon(P)$ is well known to be related to the mixing time $\mathcal{H}(P)$. More precisely, we prove Lemma 8, which implies

$$T_\varepsilon(P) = O^*\left(\mathcal{H}(P) \log \frac{1}{\varepsilon}\right). \qquad (7)$$

Since each edge $(i, j)$ such that $P_{ij} > 0$ performs an exchange of values in each iteration, the number of operations performed per iteration across the network is at most $|E|$. Thus, the total number of operations of the linear iterations needed to obtain the approximation of $x_{ave}$ scales like

$$C_\varepsilon(P) := T_\varepsilon(P) \times |E|. \qquad (8)$$

Therefore, the task of designing an appropriate $P$ with small $\mathcal{H}(P)$ is important to minimize both $T_\varepsilon(P)$ and $C_\varepsilon(P)$.

5.2. Linear iterative algorithm with pseudo-lifting: Proof of Theorem 2

We present a linear iterative algorithm that utilizes the pseudo-lifted version $\hat{P}$ of a given matrix $P$ on the original graph $G$. The main idea behind this implementation is to run the standard linear iterations in $\hat{G} = (\hat{V}, \hat{E})$ with the pseudo-lifted chain $\hat{P}$. However, we wish to implement this on $G = (V, E)$ and not on $\hat{G}$. Now recall that $\hat{G}$ has the following property: (a) each node $\hat{v} \in \hat{V}$ is a copy of a node $v \in V$, and (b) each edge $(\hat{u}, \hat{v}) \in \hat{E}$ is a copy of an edge $(u, v) \in E$, where $\hat{u}, \hat{v}$ are copies of $u, v \in V$ respectively. Therefore, for the purpose of linear iterations, each node $\hat{v} \in \hat{V}$ can be simulated by the node $v \in V$ of which $\hat{v}$ is a copy. Thus, it is indeed possible to simulate the pseudo-lifted version of a matrix $P$ on $G$ by running multiple threads (in the language of computer programming) on each node of $G$. We state this approach formally as follows:

1. Given graph $G = (V, E)$, we wish to compute the average $x_{ave}$ at all nodes. For this, first produce a matrix $P$ using the Metropolis-Hastings method with the uniform stationary distribution.

2. Construct the pseudo-lifting $\hat{P}$ based on $P$ as explained in Section 4. This pseudo-lifted random walk has a stationary distribution $\hat{\pi}$ on a graph $\hat{G}$.

3. As explained below, implement the linear iterative algorithm based on $\hat{P}$ on the original graph $G$.

o Let $t$ be the index of iterations of the algorithm; initially it is equal to 0.

o For each node $\hat{v} \in \hat{V}$, maintain a number $\hat{y}_{\hat{v}}(t)$ at the $t$-th iteration. This is maintained at the node $v \in V$ of which $\hat{v}$ is a copy. The initialization of these values is stated below.

• Recall that $\hat{V}$ contains $V$ as a subset. This subset is denoted by $V(G) \subset \hat{V}$, and each $v \in V$ has its copy $\hat{v} \in V(G)$.

• For each $\hat{v} \in V(G)$, initialize $\hat{y}_{\hat{v}}(0) = x_v$.

• For each $\hat{v} \in \hat{V} \setminus V(G)$, initialize $\hat{y}_{\hat{v}}(0) = 0$.

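o In the $(t+1)$-th iteration, update

$$\hat{y}_{\hat{v}}(t+1) = \sum_{\hat{u} \in \hat{V}} \hat{P}_{\hat{u}\hat{v}}\, \hat{y}_{\hat{u}}(t).$$

This update is performed by each node $v$ through receiving information from its neighbors $u$ in $G$, where $\hat{v}$ is a copy of $v$ and the neighbors $\hat{u}$ (of $\hat{v}$) are copies of the neighbors $u$ (of $v$).

4. At the end of the $t$-th iteration, each node $v$ produces its estimate as $2\hat{y}_{\hat{v}}(t)$, $\hat{v} \in V(G)$.

It can be easily verified that, since the above algorithm is indeed implementing the linear iterative algorithm based on $\hat{P}$, the $\varepsilon$-computation time is $T_\varepsilon(\hat{P})$ and the total number of communications performed is $C_\varepsilon(\hat{P})$. In what follows, for completeness we bound $T_\varepsilon(\hat{P})$ and $C_\varepsilon(\hat{P})$.

Lemma 8 $T_\varepsilon(\hat{P}) = O\left(\mathcal{H}(\hat{P}) \log \frac{1}{\varepsilon \pi_0}\right)$.

Proof. Here, we need the $\varepsilon$-mixing time $\tau(\varepsilon)$ based on the total variation distance; recall its definition in Section 2.4. The following relation between the two different mixing times $\tau(\varepsilon)$ and $\mathcal{H}$ is known (see [22]):

$$\tau(\varepsilon) = O\left(\mathcal{H} \log \frac{1}{\varepsilon}\right).$$

Table 1. Comparison of pseudo-lifting with the Metropolis-Hastings method. Here, we assume $G$ has $O(n)$ edges. Each cell gives the bound for a graph with doubling dimension $\rho$, followed after the colon by the bound for a $d$-dimensional grid graph; entries that are illegible in the source are marked [illegible].

                              Metropolis-Hastings     Pseudo-Lifting                                    Optimal
Mixing time (running time)    [illegible]             $O(D)$ : [illegible]                              $D$ : $n^{1/d}$
Size                          $\Theta(n)$ : $\Theta(n)$   $O(n^{\frac{\rho}{\rho+1}} D)$ : [illegible]      $n$ : $n$
Total # of operations         [illegible]             $O(n^{\frac{\rho}{\rho+1}} D^2)$ : [illegible]    $nD$ : [illegible]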



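If $t$ is larger than $\tau(\varepsilon\pi_0/4)$ of $\hat{P}$, which is $O\left(\mathcal{H}(\hat{P}) \log \frac{1}{\varepsilon\pi_0}\right)$, then the estimate $2\hat{y}_j(t)$ at every old node $j \in V(G)$ is within relative error $\varepsilon$ of $x_{ave}$ [the displayed chain of inequalities, with steps labeled (a) and (b), is illegible in the source], where (a) is from the fact that each entry of $\hat{P}^t$ deviates from the corresponding entry of $\hat{\pi}$ by less than $2 \times \frac{\varepsilon\pi_0}{4} = \frac{\varepsilon\pi_0}{2}$, and (b) is because $\hat{\pi}_j \ge \frac{1}{2}\pi_j \ge \frac{1}{2}\pi_0$ for every old node $j \in V(G)$, and $\hat{y}_j(0) = 0$ otherwise. This completes the proof. □

From the proof of Lemma 8, note that the relation $T_\varepsilon(P) = O\left(\mathcal{H}(P) \log \frac{1}{\varepsilon\pi_0}\right)$ holds for any random walk $P$. Therefore, $T_\varepsilon(\hat{P}) = O\left(D \log \frac{1}{\varepsilon\pi_0}\right)$ and $C_\varepsilon(\hat{P}) = T_\varepsilon(\hat{P}) \times |\hat{E}| = O\left(D^2 n^{\frac{\rho}{\rho+1}} \log \frac{1}{\varepsilon\pi_0}\right)$, since $\mathcal{H}(\hat{P}) = O(D)$ and $|\hat{E}| = O(D n^{\frac{\rho}{\rho+1}})$ from Lemmas 6 and 7. This also completes the proof of Theorem 2.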
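The simulation described in this section can be sketched end to end on a small example. To keep the code short, the sketch below does not use the hierarchical construction of Section 4; it assumes the simpler single-root star pseudo-lifting analyzed in Theorem 5, applied to a line graph, with $\delta = D/(2D-1)$ so that each old node keeps exactly half of its original stationary mass. It also runs the lifted chain directly rather than distributing the copies over the nodes of $G$. All function names and sample values are illustrative assumptions.

```python
def path_metropolis(n):
    """Metropolis-Hastings walk on the line graph 0 - 1 - ... - (n-1), uniform stationary law."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                P[i][j] = 0.5
        P[i][i] = 1.0 - sum(P[i])
    return P

def star_pseudo_lift(P, D):
    """Return the lifted transition probabilities {(a, b): p} and the number of lifted nodes."""
    n = len(P)
    pi = 1.0 / n
    delta = D / (2.0 * D - 1.0)       # old nodes then carry stationary mass pi / 2
    unit = delta / (2.0 * D) * pi     # ergodic flow on every edge of an added path
    Q = {(i, j): (1.0 - delta) * pi * P[i][j]
         for i in range(n) for j in range(n) if P[i][j] > 0.0}
    root, next_id = n, n + 1          # node n is the copy v' of the root
    for w in range(n):
        up_inner = list(range(next_id, next_id + D - 1))
        next_id += D - 1
        down_inner = list(range(next_id, next_id + D - 1))
        next_id += D - 1
        for path in ([w] + up_inner + [root], [root] + down_inner + [w]):
            for a, b in zip(path, path[1:]):
                Q[(a, b)] = unit
    out = [0.0] * next_id             # out[a] is the stationary mass of node a
    for (a, _), q in Q.items():
        out[a] += q
    return {(a, b): q / out[a] for (a, b), q in Q.items()}, next_id

def averaging_on_lift(x, steps):
    n = len(x)
    P_hat, size = star_pseudo_lift(path_metropolis(n), D=n - 1)
    y = list(x) + [0.0] * (size - n)  # new nodes start at 0
    for _ in range(steps):
        new_y = [0.0] * size
        for (a, b), p in P_hat.items():
            new_y[b] += p * y[a]
        y = new_y
    return [2.0 * y[i] for i in range(n)]  # each old node reports twice its value

estimates = averaging_on_lift([1.0, 2.0, 3.0, 4.0, 5.0, 9.0], steps=2000)
```

The factor 2 in the final estimate compensates for the old nodes holding only half of the stationary mass of the lifted chain.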



5.3. Comparison with other algorithms 

Even among all possible algorithms based on passing messages, the lower bound on the performance guarantees for the averaging problem is $\Omega(D)$ for the running time, and $\Omega(Dn)$ for the total number of operations. Therefore, our algorithm using pseudo-lifting gives the best running time, and possibly loses a factor of $O^*\left(\frac{D^2 n^{\rho/(\rho+1)}}{Dn}\right) = O^*\left(D / n^{\frac{1}{\rho+1}}\right)$ in terms of the total number of operations compared to the best algorithm. For example, when $G$ is a $d$-dimensional grid graph, this loss is only $O^*\left(D / n^{\frac{1}{\rho+1}}\right) = O^*\left(n^{\frac{1}{d}} / n^{\frac{1}{d+1}}\right) = O^*\left(n^{\frac{1}{d(d+1)}}\right)$, since the doubling dimension of $G$ is $d$ and its diameter $D$ is $O(n^{1/d})$. The standard linear iterations using the Metropolis-Hastings method lose an $\Omega(n^{1/d})$ factor in both the running time and the total number of operations (see Table 1).
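As a worked instance of this loss factor, take a two-dimensional grid. The values $d = 2$ and $n = 10^6$ below are chosen purely for illustration and do not appear in the paper.

```latex
% Loss in the total number of operations for a d-dimensional grid (rho = d, D = n^{1/d}):
\[
  \frac{D}{n^{1/(\rho+1)}}
  = n^{\frac{1}{d} - \frac{1}{d+1}}
  = n^{\frac{1}{d(d+1)}} .
\]
% For d = 2 this is n^{1/6}; with n = 10^6 nodes the loss factor is 10,
% whereas the Metropolis-Hastings loss factor n^{1/d} = n^{1/2} is 1000.
```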

We take note of the following subtle matter: the non-reversibility is captured in the transition probabilities of the underlying Markov chain (or random walk), but the linear iterative algorithm does not change its form other than this detail.

6. Lifting Using Expanders

We introduced the new notion of pseudo-lifting for the applications of interest, one of which was distributed averaging. However, since it may not be relevant to certain other applications, we also optimize the size of the lifting (not pseudo-lifting) in [7]. The basic motivation of our construction is to use an expander graph, instead of the complete graph in [7], to reduce the size of the lifting.

6.1. Preliminaries

In what follows, we will consider only $P$ such that $P \ge I/2$. This is without loss of generality for the following reason. If this is not the case, we can replace $P$ by $(I + P)/2$; the mixing time of $(I + P)/2$ is within a constant factor of the mixing time of $P$.
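The modification $(I + P)/2$ is easy to state in code. The following minimal sketch (the function name `lazy` and the 4-cycle example are illustrative assumptions) builds the lazy version of a periodic walk and checks that its diagonal is at least $1/2$ and that the uniform stationary distribution is unchanged.

```python
def lazy(P):
    """Return (I + P) / 2 for a transition matrix P given as a list of rows."""
    n = len(P)
    return [[0.5 * P[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

# Simple random walk on a 4-cycle: periodic, with zero diagonal.
cycle = [
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
]
L = lazy(cycle)
# Uniform distribution pushed one step forward through L.
pushed = [sum(0.25 * L[i][j] for i in range(4)) for j in range(4)]
```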

6.1.1. Multi-commodity Flows

In [7], the authors use a multi-commodity flow to construct a specific lifting of a given random walk $P$ to speed up its mixing time. Specifically, they consider a multi-commodity flow problem on $G$ with the capacity constraint on edge $(u, v) \in E$ given by $Q_{uv}$. A flow from a source $s$ to a destination $t$, denoted by $f$, is defined as a non-negative function on edges of $G$ so that

$$\sum_j f(i, j) = \sum_j f(j, i)$$

for every node $i \ne s, t$. The value of the flow is defined by

$$\mathrm{val}(f) = \sum_j f(s, j) - \sum_j f(j, s) = \sum_j f(j, t) - \sum_j f(t, j),$$

and the cost of the flow is defined as

$$\mathrm{cost}(f) = \sum_{(i,j) \in E} f(i, j).$$

A multi-commodity flow is a collection $f = (f^{st})_{s,t \in V}$ of flows, where each $f^{st}$ is a flow from $s$ to $t$. Define the congestion of a multi-commodity flow $f$ as

$$\max_{(i,j) \in E} \frac{\sum_{s,t} f^{st}(i, j)}{Q_{ij}}.$$

Consider the following optimization problem, essentially trying to minimize the congestion and the cost simultaneously under the condition on the amount of flows:

minimize $K$

subject to $\mathrm{val}(f^{st}) = \pi_s \pi_t$, $\forall s, t$,

$\sum_{s,t} f^{st}(i, j) \le K Q_{ij}$, $\forall (i, j) \in E$,

$\sum_t \mathrm{cost}(f^{st}) \le K \pi_s$, $\sum_s \mathrm{cost}(f^{st}) \le K \pi_t$, $\forall s, t$.

Let $C$ be the optimal value of the above problem. It is easy to see that $C \ge 1/\Phi$. Further, if $P$ is reversible, then the result of Leighton and Rao [21] on approximate multi-commodity flows implies that

$$C = O\left(\frac{1}{\Phi} \log \frac{1}{\pi_0}\right).$$

Let the optimal multi-commodity flow of the above problem be $F_1$; we can think of $F_1$ as a weighted collection of directed paths. In [7], the authors modified $F_1$ and got a new multi-commodity flow $F_2$ that has the same amount of $s$-$t$ flows as $F_1$, while its congestion and path length are at most $12C$. They used $F_2$ to construct a lifting $\hat{P}$ with mixing time $\mathcal{H}$ such that

$$\mathcal{H} \le 144C.$$

Also, they showed that the mixing time of any lifting $\hat{P}$ is greater than $C/2$; hence their lifted Markov chain has almost optimal speed-up within a constant factor.

To obtain a lifting of smaller size than that in [7], we will study the existence of a specific $k$-commodity flow with short path lengths. For this, we will use a balanced multi-commodity flow, which is a multi-commodity flow with the following condition on the amount of flows:

$$\mathrm{val}(f^{st}) = g(s, t), \quad \forall s, t,$$

where $g(s, t)$ satisfies the balanced condition:

$$\sum_t g(s, t) \le \pi_s, \quad \sum_s g(s, t) \le \pi_t, \quad \forall s, t.$$

Therefore, $F_1$ and $F_2$ are also balanced multi-commodity flows with $g(s, t) = \pi_s \pi_t$. Given a multi-commodity flow $f$, let $C(f)$ be its congestion and $D(f)$ be the length of the longest flow-path. Then, the flow number $T$ is defined as follows:

$$T = \min_f \left(\max\{C(f), D(f)\}\right),$$

where the minimum is taken over all balanced multi-commodity flows with $g(s, t) = \pi_s \pi_t$. Hence, $F_2$ implies $T \le 12C$. The following claim appears in [19]:

Claim 9 (Claim 2.2 in [19]) For any $g(s, t)$ satisfying the balanced condition (not necessarily $g(s, t) = \pi_s \pi_t$), there exists a balanced multi-commodity flow $f$ with $g(s, t)$ such that $\max\{C(f), D(f)\} \le 2T$.
6.1.2. Expanders 

Expander graphs are sparse graphs which have high connectivity properties, quantified using the edge expansion $h(G)$ defined as

$$h(G) = \min_{1 \le |S| \le n/2} \frac{|\partial(S)|}{|S|},$$

where $\partial(S)$ is the set of edges with exactly one endpoint in $S$. For constants $d$ and $c$, a family $\mathcal{G} = \{G_1, G_2, \dots\}$ of $d$-regular graphs is called a $(d, c)$-expander family if $h(G) \ge c$ for every $G \in \mathcal{G}$. There are many explicit constructions of $(d, c)$-expander families available in recent times. We will use a $(d, c)$-expander graph $G^{Ex} = (V^{Ex}, E^{Ex})$ (i.e. $V^{Ex} = V$), and a transition matrix $P^{Ex}$ defined on this graph. For a given $\pi$, we can define a reversible $P^{Ex}$ so that its stationary distribution is $\pi$ [the displayed definition is illegible in the source].

In the case of $\pi_{max} = O(\pi_0)$, it is easy to check that $\Phi(P^{Ex}) = \Theta(h(G)) = \Omega(1)$, where $\Phi(P^{Ex})$ is the conductance of $P^{Ex}$. Hence, $\lambda_{P^{Ex}} = \Omega(1)$, and the random walk defined by $P^{Ex}$ mixes fast. In this section, we will consider only such $\pi$.
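For small graphs, the edge expansion $h(G)$ defined above can be computed by brute force over all subsets of at most $n/2$ nodes. The sketch below is a minimal illustration, exponential in $n$ and therefore only suitable for tiny examples; the function name `edge_expansion` and the two test graphs are assumptions made for the example.

```python
from itertools import combinations

def edge_expansion(n, edges):
    """Minimum over nonempty S with |S| <= n/2 of (edges leaving S) / |S|."""
    best = None
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            S = set(subset)
            # an edge is in the boundary when exactly one endpoint lies in S
            boundary = sum(1 for u, v in edges if (u in S) != (v in S))
            ratio = boundary / size
            if best is None or ratio < best:
                best = ratio
    return best

complete4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
cycle8 = [(i, (i + 1) % 8) for i in range(8)]
h_complete = edge_expansion(4, complete4)
h_cycle = edge_expansion(8, cycle8)
```

The complete graph keeps a large expansion, while the cycle's expansion shrinks as it grows, which is why graphs with geometry such as lines and grids are far from being expanders.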

6.2. Construction 

We use the multi-commodity flow based construction introduced in [7]. There, the authors essentially use a multi-commodity flow between source-destination pairs for all $s, t \in V$. Instead, we will use a balanced multi-commodity flow between source-destination pairs that are obtained from an expander. Thus, the essential change in our construction is the use of an expander in place of the complete graph used in [7]. A caricature of this lifting is shown in Figure 4. However, this change makes the analysis of the mixing time a lot more challenging and requires us to use different analysis techniques. Further, we use arguments based on classical linear programming to derive the bound on the size of the lifting.




Fig 4. A caricature of lifting using an expander. Let $G$ be a line graph with 4 nodes. We wish to use an expander $G^{Ex}$ with 4 nodes, shown on the top-right side of the figure. $G$ is lifted by adding paths that correspond to edges of the expander. For example, an edge (2, 4) of the expander is added as path (2, 3', 4). We also draw the lifting in [7], which uses the complete graph (panel: "Lifting G using the complete graph").



To this end, we consider the following multi-commodity flow: let $G^{Ex} = (V, E^{Ex})$ be an expander with a transition matrix $P^{Ex}$ and a stationary distribution $\pi$ as required; this is feasible since we have assumed $\pi_{max} = O(\pi_0)$. We note that this assumption is used only for the existence of expanders. Consider a multi-commodity flow $f = (f^{st})_{(s,t) \in E^{Ex}}$ so that

(a) $\mathrm{val}(f^{st}) = \pi_s P^{Ex}_{st} = Q^{Ex}_{st}$, $\forall (s, t) \in E^{Ex}$;

(b) $\sum_{s,t} f^{st}(i, j) \le K Q_{ij}$, $\forall (i, j) \in E$.

Lemma 10 There is a feasible multi-commodity flow in the above flow problem with congestion ($K$) and path-length at most $W$, where $W = O^*(1/\Phi(P))$.

Proof. The conclusion is derived directly from Claim 9, since the flow number $T$ is at most $12C = O^*(1/\Phi(P))$ and the flow considered is a balanced multi-commodity flow, i.e. $W = 24C = O^*(1/\Phi(P))$. □

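Now, we can think of this multi-commodity flow as a weighted collection of directed paths $\{(\mathcal{P}_r, w_r) : 1 \le r \le N\}$, where the total weight of paths from node $s$ to $t$ is $\pi_s P^{Ex}_{st}$ for $(s, t) \in E^{Ex}$. Let $\ell_r$ be the length of path $\mathcal{P}_r$. From Lemma 10, we have the following:

$$\sum_r w_r = 1, \quad \ell_r \le W, \qquad (9)$$

$$\sum_{r : \mathcal{P}_r \text{ starts at } i} w_r = \pi_i, \quad \sum_{r : \mathcal{P}_r \text{ ends at } i} w_r = \pi_i, \quad \text{for } i \in V, \qquad (10)$$

$$\sum_{r : (i,j) \in E(\mathcal{P}_r)} w_r \le W Q_{ij}, \quad \text{for } (i, j) \in E. \qquad (11)$$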

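Using such a collection of weighted paths, we construct the desired lifting next. As in Figure 4, we construct the lifted graph $\hat{G} = (\hat{V}, \hat{E})$ from $G$ by adding a directed path $\mathcal{P}'_r$ of length $\ell_r$ connecting $i$ to $j$ if $\mathcal{P}_r$ goes from $i$ to $j$. Subsequently, $\ell_r - 1$ new nodes are added to the original graph for each such path. The ergodic flow on an edge $(i, j)$ of the lifted chain is defined by

$$\hat{Q}_{ij} = \begin{cases} \frac{w_r}{2W} & \text{if } (i, j) \in E(\mathcal{P}'_r), \\ Q_{ij} - \sum_{r : (i,j) \in E(\mathcal{P}_r)} \frac{w_r}{2W} & \text{if } (i, j) \in E(G). \end{cases}$$

It is easy to check that this defines a Markov chain on $\hat{G}$, and that the natural mapping of the paths $\mathcal{P}'_r$ onto the paths $\mathcal{P}_r$ collapses the random walk on $\hat{G}$ onto the random walk on $G$. The stationary distribution of the lifted chain is

$$\hat{\pi}_i = \begin{cases} \frac{w_r}{2W} & \text{if } i \in V(\mathcal{P}'_r) \setminus V(G), \\ \pi_i - \sum_{r : \mathcal{P}_r \text{ thru } i} \frac{w_r}{2W} & \text{if } i \in V(G). \end{cases}$$

Thus, the above stated construction is a valid lifting of the given Markov chain $P$ defined on $G$.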



6.3. Mixing time and size: Proof of Theorem 3

We prove two lemmas about the performance of the lifting we constructed; together they imply Theorem 3. First, we state and prove the lemma which bounds the mixing time of the lifted chain we constructed.

Lemma 11 The mixing time $\mathcal{H}$ of the lifted Markov chain represented by $\hat{Q}$ defined on $\hat{G}$ is $O^*(1/\Phi(P))$.

Proof. By the property of expanders, we have $\lambda_{P^{Ex}} = \Omega(1)$. Therefore, it is sufficient to show

$$\mathcal{H} = O\left(\frac{W}{\lambda_{P^{Ex}}} \log \frac{1}{\pi_0}\right).$$

First, note that for any node $i \in V$ (i.e. an original node $i$ in $G$),

$$\frac{1}{2}\pi_i \le \hat{\pi}_i \le \pi_i. \qquad (12)$$

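Now, under the lifted Markov chain, the probability of getting on a directed path $\mathcal{P}'_r$ starting at $i$ is

$$\frac{\hat{Q}_{ij}}{\hat{\pi}_i} = \frac{w_r}{2W\hat{\pi}_i},$$

where $(i, j)$ is the first edge of $\mathcal{P}'_r$. Hence the probability of getting on any directed path starting at $i$ is

$$\sum_{r : \mathcal{P}'_r \text{ starts at } i} \frac{w_r}{2W\hat{\pi}_i} = \frac{\pi_i}{2W\hat{\pi}_i}.$$

From (12), this is bounded between $\frac{1}{2W}$ and $\frac{1}{W}$.

To study $\mathcal{H}$, we will focus on the random walk (or Markov chain) induced on the original nodes $V \subset \hat{V}$ by the lifted Markov chain $\hat{P}$. Let $P^V$ be the transition matrix of this induced random walk. Then,

$$P^V_{ij} = \hat{P}_{ij} + \sum_{r : \mathcal{P}'_r \text{ goes from } i \text{ to } j} \frac{w_r}{2W\hat{\pi}_i}.$$

Now, $P^V_{ii} \ge \hat{P}_{ii} \ge 1/4$, because $\hat{P}_{ii} = \hat{Q}_{ii}/\hat{\pi}_i \ge Q_{ii}/(2\hat{\pi}_i) = P_{ii}\pi_i/(2\hat{\pi}_i) \ge P_{ii}/2 \ge 1/4$. Here we have assumed that $P \ge I/2$, as discussed earlier. Now,

$$P^V_{ij} \ge \frac{1}{2W\hat{\pi}_i} \sum_{r : \mathcal{P}'_r \text{ goes from } i \text{ to } j} w_r = \frac{\pi_i P^{Ex}_{ij}}{2W\hat{\pi}_i} \ge \frac{1}{2W} P^{Ex}_{ij}.$$

Moreover, its stationary distribution $\pi^V$ is $\pi^V_i = \hat{\pi}_i / \hat{\pi}(V)$. Therefore, by (12) we have $\frac{1}{2}\pi_i \le \pi^V_i \le 2\pi_i$. Now, we can apply Claim 14 to obtain the following:

$$\lambda_{P^V (P^V)^*} = \Omega\left(\frac{\lambda_{P^{Ex}}}{W}\right). \qquad (13)$$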
Now, we are ready to design the following stopping rule $\Gamma$, which will imply the desired bound on $\mathcal{H}$.

(i) Walk until visiting old nodes of $V \subset \hat{V}$ for $T$ times, where $T := 2\log(2/\pi_0)/\lambda_{P^V (P^V)^*}$. Let this $T$-th old node be denoted by $X$.

(ii) Stop at $X$ with probability 1/2.

(iii) Otherwise, continue walking until getting onto any directed path $\mathcal{P}'_r$; choose an interior node $Y$ of $\mathcal{P}'_r$ uniformly at random and stop at $Y$.

(Footnote to Lemma 11: the precise bound is $O\left(W \log \frac{1}{\pi_0}\right)$.)

From the relation (2) in Section 2.4 with $\varepsilon = \frac{1}{2}\sqrt{\pi_0}$, it follows that after time $T$ as defined above the Markov chain $P^V$, restricted to old nodes $V$, has distribution close to $\pi^V$, i.e.

$$|\Pr(X = w) - \pi^V_w| \le \pi^V_w / 2, \quad \forall w \in V.$$

According to the above stopping rule, we stop at an old node $w$ with probability 1/2. Therefore, for any $w \in V$, the stopping rule $\Gamma$ stops at $w$ with probability at least $\pi^V_w/4 \ge \pi_w/8 \ge \hat{\pi}_w/8$. With probability 1/2, the rule does not stop at the node $X$. Let $w^k$ be the $k$-th point in the walk starting from $X$. Because at any old node $i$ the probability of getting on any directed path is between $\frac{1}{2W}$ and $\frac{1}{W}$, a coupling argument shows that for any old node $i$,

$$\Pr(w^k = i \mid w^1, \dots, w^k \text{ are old nodes}) \ge \left(1 - \frac{1}{W}\right)^k \frac{1}{2}\pi^V_i.$$

If $w$ is a new point on the directed path $\mathcal{P}'_r$ which connects the old node $i$ to $j$, then

$$\Pr(\Gamma \text{ stops at } w) \ge \frac{1}{2}\sum_{k=0}^{\infty} \Pr(w^k = i \mid w^1, \dots, w^k \text{ are old points}) \times \Pr(\text{at } i, \text{ get on the path } \mathcal{P}'_r) \times \frac{1}{\ell_r}$$

$$\ge \frac{1}{2}\sum_{k=0}^{\infty} \left(1 - \frac{1}{W}\right)^k \frac{1}{2}\pi^V_i \cdot \frac{w_r}{2W\hat{\pi}_i} \cdot \frac{1}{W} \ge \frac{w_r}{16W} = \frac{1}{8}\hat{\pi}_w.$$


The average length of this stopping rule is $O(T + W)$. By (13),

$$O(T + W) = O\left(\frac{2\log(2/\pi_0)}{\lambda_{P^V (P^V)^*}} + W\right) = O\left(\frac{W}{\lambda_{P^{Ex}}} \log \frac{1}{\pi_0}\right).$$

Thus, we have established that the stopping rule $\Gamma$ has average length $O(W \log \frac{1}{\pi_0})$ and that the distribution of the stopping node is $\Omega(\hat{\pi})$. Therefore, using the fill-up lemma stated in [1], it follows that $\mathcal{H} = O(W \log \frac{1}{\pi_0})$. □

Also, we bound the size of the lifted chain we constructed as follows.

Lemma 12 The size of the lifted Markov chain can be bounded above as $O^*(|E|/\Phi(P))$. (The precise bound is $O(|E| W)$.)

Proof. We want to establish that the size of the lifted chain in terms of the number of edges satisfies $|\hat{E}| = O^*(|E|/\Phi(P))$. Note that the lifted graph $\hat{G}$ is obtained by adding paths that appear in the solution of the multi-commodity flow problem. Therefore, to establish the desired bound we need to bound the number of distinct paths as well as their lengths.

To this end, let us re-formulate the multi-commodity flow based on the expander $G^{Ex}$ as follows. For each $(s, t) \in E^{Ex}$, we add a flow between $s$ and $t$. Let this flow be routed along possibly multiple paths. Let $\mathcal{P}_{stj}$ denote the $j$-th path from $s$ to $t$ and $x_{stj}$ be the amount of flow sent along this path. The length $\ell_{stj}$ of $\mathcal{P}_{stj}$ is at most $W$, as discussed in Lemma 10. The overall solution, denoted by $\{(\mathcal{P}_r, w_r)\}$, gives a feasible solution in the following polytope with $x_{stj}$ as its variables:

$$\sum_j x_{stj} = \pi_s P^{Ex}_{st}, \quad \forall (s, t) \in E^{Ex},$$

$$\sum_{(s,t)} \sum_{j : e \in E(\mathcal{P}_{stj})} x_{stj} \le W Q_e, \quad \forall e \in E,$$

$$x_{stj} \ge 0, \quad \forall s, t, j.$$

Clearly, any feasible solution in this polytope, say $\{(\mathcal{P}_r, w_r)\}$, will work for our lifting construction. Now, the size of its support set is $|\{(\mathcal{P}_r, w_r)\}|$. If we consider an extreme point of this polytope, the size of its support set is at most $|E^{Ex}| + |E| = O(|E|)$, because an extreme point is the unique solution of a sub-collection of linear constraints in this polytope. Hence, if we choose such an extreme point $\{(\mathcal{P}_r, w_r)\}$ for our lifting, the size of our lifted chain $|\hat{E}|$ is at most $O(W|E|)$, since each path is of length $O(W)$. Thus, we have established that the size of the lifted Markov chain is at most $O(W|E|) = O^*(|E|/\Phi(P))$. □

6.4. Useful Claims

We state and prove two useful claims which play a key role in proving Lemma 11.

Claim 13 Let $P_1, P_2$ be reversible Markov chains with stationary distributions $\pi_1, \pi_2$ respectively. If there exist positive constants $\alpha, \beta, c, d$ such that $P_1 \ge \alpha P_2$, $P_1 \ge \beta I$ and $c\pi_2 \le \pi_1 \le d\pi_2$, then

$$\lambda_{P_1} \ge \min\left(\frac{\alpha c}{d^2}\lambda_{P_2},\ 2\beta\right).$$

Proof. From the min-max characterization of the spectral gap (see, e.g., page 176 in [14]) for a reversible Markov chain, it follows that

$$\lambda_{P_1} = \inf_{\psi : V \to \mathbb{R}} \frac{\sum_{i,j \in V} (\psi(i) - \psi(j))^2 (\pi_1)_i (P_1)_{ij}}{\sum_{i,j \in V} (\psi(i) - \psi(j))^2 (\pi_1)_i (\pi_1)_j}.$$

The smallest eigenvalue of $P_1$ is greater than $2\beta - 1$ because $P_1 \ge \beta I$. So, the distance between the smallest eigenvalue and $-1$ is greater than $2\beta$. This completes the proof. □

Claim 14 Let $P_1, P_2$ be Markov chains with stationary distributions $\pi_1, \pi_2$ respectively. Now, suppose $P_2$ is reversible ($P_1$ is not necessarily reversible). If there exist positive constants $\alpha, \beta, c, d$ such that $P_1 \ge \alpha P_2$, $P_1 \ge \beta I$ and $c\pi_2 \le \pi_1 \le d\pi_2$, then

$$\lambda_{P_1 P_1^*} \ge \min\left(\frac{\alpha\beta c}{d^2}\lambda_{P_2},\ 2\beta^2\right).$$

Proof. $P_1 P_1^*$ is a reversible Markov chain which has $\pi_1$ as its stationary distribution. Because $P_1^* \ge \beta I$, $P_1 P_1^* \ge \alpha P_2 P_1^* \ge \alpha\beta P_2$. Also, $P_1 P_1^* \ge \beta^2 I$. Now, the proof follows from Claim 13. □

7. Conclusion 

Motivated by applications arising in emerging networks such as sensor networks, peer-to-peer networks and surveillance networks of unmanned vehicles, we consider the question of designing fast linear iterative algorithms for computing the average of numbers in a network. We presented a novel construction of such an algorithm by designing the fastest mixing non-reversible Markov chain on any given graph. Our Markov chain is obtained through a new notion called pseudo-lifting. We apply our constructions to graphs with geometry, or graphs with doubling dimension. By using their topological properties explicitly, we obtain fast and slim pseudo-lifted Markov chains. The effectiveness (and optimality) of our constructions is explained through various examples. As a byproduct, our result provides the fastest mixing Markov chain for any given graph, which should be of interest in its own right. Our results should naturally find applications in the context of distributed optimization, estimation and control.

We note that the pseudo-lifting presented here is based on a two-level "hierarchical star" topology. This construction is less robust to node failures. For example, failure of the "root" node can increase the mixing time drastically. To address this, one may alternatively use a "hierarchical expander" based pseudo-lifting. That is, in place of the "star" topology in the pseudo-lifting, utilize the "expander" topology. This will naturally make the construction more robust without loss of performance. Of course, this will complicate the mixing time analysis drastically. This is where our method developed for the expander-based lifting will be readily useful.